noise, and sidetone paths, for a reasonably wide range of values of these
characteristics in any combination. Effects of some other phenomena can
also be approximately estimated, but are not yet incorporated in the model.
No attempt has yet been made to cater for features such as voice\(hyswitching
effects, or
vocoding and other sophisticated schemes for reducing information rate.
Compare the groups of factors listed in Question\ 7/XII\ [14].
.PP
The program CATPASS [16] \(em a mnemonic for COMPUTER\(hyAIDED TELEPHONY
PERFORMANCE ASSESSMENT \(em incorporated the same model in a simplified,
fixed\(hyparameter implementation, together with facilities for calculating the
sensitivity\(hyfrequency response of a complete connection formed by concatenating
common pieces of apparatus such as telephones, cables, feeding bridges,
junctions, and filters. It was similar to the system described in\ [17]
and\ [18], but the program was differently organized. However, CATPASS could
handle symmetrical connections only \(em that is, those for which transmission,
room noise, sidetone and all other relevant features were the same for both
participants. It was superseded by a program called CATNAP (COMPUTER\(hyAIDED
TELEPHONE NETWORK ASSESSMENT PROGRAM), which incorporated an extended form
of the fixed\(hyparameter model, allowing asymmetry in the connections,
as well as
containing facilities for assembling performance statistics on sets of
connections. See\ [19].
.PP
CATNAP has been superseded in turn by CATNAP83, in which three main
changes have been made:
.RT
.LP
a)
minor improvements to the subjective model;
.LP
b)
calculation of loudness ratings according to
Recommendation\ P.79, instead of the provisional version P.XXE\ [20] which
(notwithstanding the statement made in the earlier version of this
Supplement\ [21]) was used for calculating loudness ratings in CATNAP;
.LP
c)
introduction of more flexibility to allow parameters such as the earphone
coupling loss factor (\fIL\fR\d\fIE\fR\u) to depend on the particular type
of handset.
.sp 1P
.LP
2.3
\fISituation to be represented\fR
.sp 9p
.RT
.PP
Let A and B denote two \*Qaverage\*U participants in a telephone
conversation over a link terminated in handset telephones, located in rooms
with no abnormal reverberation and with specified levels of room noise.
\*QAverage\*U is intended to convey that the participants have representative
hearing and speaking characteristics and a normal attitude towards telephone
facilities, so that their satisfaction with the telecommunication link
may be measured by the
mean Conversation Opinion Score
(\fIY\fR\d\fIC\fR\u) and the
Percentage Difficulty
(%\fID\fR ) that would be obtained from a
conversation test, as described in Supplement\ No.\ 2. \fIY\fR\d\fIC\fR\u can
take any value between\ 4 and\ 0, the scale being: 4\ =\ EXCELLENT,
3\ =\ GOOD, 2\ =\ FAIR, 1\ =\ POOR, 0\ =\ BAD. %\fID\fR can take
any value between 0% for the best connections and 100% for the worst.
.PP
For a given connection, the quantities of chief interest are
\fIY\fR\d\fIC\fR\u, %\fID\fR and the speech level, for each participant.
However, other useful auxiliary quantities are computed in the course of
the evaluation, such as the loudness ratings of the various paths (calculated
according to
Recommendation\ P.79), and \fIY\fR\d\fILE\fR\u, the mean
Listening Effort
Score
that would result from a listening opinion test conducted as outlined in
Supplement\ No.\ 2. In a listening test of this type, lists of sentences
at a standard input speech level are transmitted over the connection and
the
listener expresses an opinion, at a number of different listening levels, on
the \*Qlistening effort\*U according to the following scale:
.RT
.LP
\fIEffort required to understand the meanings of sentences\fR
.LP
A
Complete relaxation possible; no effort required
.LP
B
Attention necessary; no appreciable effort required
.LP
C
Moderate effort required
.LP
D
Considerable effort required
.LP
E
No meaning understood with any feasible effort.
.bp
.PP
The votes are scored A\ =\ 4, B\ =\ 3, C\ =\ 2, D\ =\ 1, E\ =\ 0, and
the mean taken over all listeners is called the Listening Effort Score,
\fIY\fR\d\fILE\fR\u, for each particular listening level and each
circuit
condition.
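.PP
The scoring rule above can be sketched as follows. This is only an illustration
of the arithmetic; the function name and the vote list are hypothetical and do
not come from the test procedure itself.
.LP
.nf
```python
from statistics import mean

# Numeric scores attached to the five listening-effort categories.
SCORES = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


def listening_effort_score(votes):
    """Mean score over all listeners for one listening level and one
    circuit condition. `votes` is a list of category letters A..E."""
    if not votes:
        raise ValueError("at least one vote is required")
    return mean(SCORES[v] for v in votes)


# Hypothetical votes from four listeners at one listening level.
example = listening_effort_score(["A", "B", "B", "C"])
```
.fi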
.PP
More detailed information about both conversation tests and listening tests
may be found in\ [22], and also in Supplement\ No.\ 2.
.RT
.sp 1P
.LP
2.4
\fIOutline of the model\fR
.sp 9p
.RT
.PP
The model requires the following inputs:
.RT
.LP
1)
overall sensitivity\(hyfrequency characteristic of each
transmission path (talker's mouth to listener's ear via the
connection) and sidetone path (each talker's mouth to his own
ear). These sensitivities may be either measured by the method
described in Recommendation\ P.64 or calculated as explained in
Reference\ [17];
.LP
2)
noise spectrum and level at each listener's ear, composed
of noise arising in the circuit, room noise reaching the
listening ear direct, and room noise reaching the listening
ear via the sidetone path. In the absence of specific
measurements, standard noise spectra and levels are taken;
e.g.\ room noise with Hoth spectrum at 50\ dBA, circuit noise
with bandlimited spectrum at a specified psophometrically
weighted level;
.LP
3)
average speech spectrum and average threshold of hearing,
as given for example in\ [23].
.PP
From these data the
loudness ratings
are calculated. With speech level fixed, \fIY\fR\d\fILE\fR\u
and a provisional value of \fIY\fR\d\fIC\fR\u are
evaluated for each participant. The relationships between \fIY\fR\d\fIC\fR\u and
speech
level at each end are then used to refine the values of both, so that the
final estimates represent performance at realistic conversational speech
levels.
.sp 1P
.LP
2.5
\fICalculation of\fR
\fIloudness\fR \fIand\fR
\fIloudness\fR
\fIratings\fR
.sp 9p
.RT
.PP
The model starts by setting the speech level emitted from each
talker to a standard value and calculating the resultant spectrum and level
of both speech and noise at each listener's ear. The loudness of received
speech is calculated as a function of signal level, noise level and threshold
of
hearing, integrated over the frequency range extending normally from 179\ to
4472\ Hz (14\ bands, the lowest centred at 200\ Hz and the highest at
4000\ Hz).
The loudness of the sidetone speech is calculated similarly, but with an
allowance for the additional masking effect of speech reaching the ear
naturally (via the air path and the bone\(hyconduction path). By comparison
with the loudness of speech transmitted over an IRS (
Intermediate Reference
System
), the loudness ratings of the various paths are evaluated:
SLR
,
RLR
and
STMR
for each end, and
OLR
in each direction.
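.PP
The quoted frequency limits are consistent with 14 third\(hyoctave bands whose
edges lie at the geometric mean of adjacent nominal centre frequencies. The
sketch below checks that arithmetic. The list of centre frequencies and the
two neighbouring bands (160\ Hz and 5000\ Hz) are assumptions made for this
illustration; the text above states only the number of bands, the lowest and
highest centres, and the overall limits.
.LP
.nf
```python
import math

# Assumed nominal third-octave centre frequencies in Hz (14 bands).
CENTRES = [200, 250, 315, 400, 500, 630, 800,
           1000, 1250, 1600, 2000, 2500, 3150, 4000]
# Assumed neighbouring centres just outside the range.
BELOW, ABOVE = 160, 5000


def band_edges(centres=CENTRES, below=BELOW, above=ABOVE):
    """Band edges taken as the geometric mean of adjacent centres."""
    points = [below, *centres, above]
    return [math.sqrt(lo * hi) for lo, hi in zip(points, points[1:])]


edges = band_edges()
lower_limit = round(edges[0])    # lower edge of the 200 Hz band
upper_limit = round(edges[-1])   # upper edge of the 4000 Hz band
```
.fi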
.PP
The method is described in [24], but is not given in detail here. The loudness
part of the model is important in its own right [for example in the
study of Question\ 19/XII\ [25]], but it is not closely connected with the rest
of the model. The program outputs loudness ratings calculated according
to
Recommendation\ P.79, but also calculates a set of loudness ratings according
to the earlier method\ [26] which are used for subsequent calculations.
.RT
.sp 1P
.LP
2.6
\fICalculation of\fR
\fIlistening effort score\fR
.sp 9p
.RT
.PP
This part of the model is intended to reproduce the result that
would be obtained from a Listening Opinion Test.
.PP
It has been found possible to estimate \fIY\fR\d\fILE\fR\u by
a process
similar to those already well known in calculating loudness and articulation
score. An intermediate quantity, Listening Opinion Index (LOI), is first
calculated as follows. Each elementary band in the frequency range contributes
to LOI an amount proportional to the product
\fIB\fR\(fm\d\fIf\fR\u\|\fIP\fR\|(\fIZ\fR\d\fIf\fR\u),
where \fIB\fR\(fm\d\fIf\fR\u is a frequency\(hyweighting factor expressing the
relative importance of that elementary
band for effortless
comprehension, and
\fIP\fR is a growth function applied to the sensation level\ \fIZ\fR (which
has already been evaluated for the loudness calculation). The actual values
of the
frequency\(hyweightings differ somewhat from those used in loudness and
articulation calculations; the growth function is limited to the range
0\ to\ 1 as in articulation, but the form used is:
\v'6p'
.RT
.sp 1P
.ce 1000
\fIP\fR (\fIZ\fR ) = 10\u@ { \fIZ\fR~+~3.8 } over { 10 } @\d\ \ if \fIZ\fR < \(em11,
.ce 0
.sp 1P
.ce 1000
.sp 1
\fIP\fR (\fIZ\fR ) = 1 \(em 10\u@ { \(em0.3 (\fIZ\fR~+~14) } over { 10 } @\d\ \ otherwise.
.ce 0
.sp 1P
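.PP
A minimal sketch of the growth function follows, taking the divisor in both
exponents as 10. With that reading the two branches nearly meet at
\fIZ\fR\ =\ \(mi11 and the function stays between 0 and 1, as the text
requires. The function name is hypothetical.
.LP
.nf
```python
def growth(z):
    """Growth function P(Z) applied to the sensation level Z (dB).

    Assumes a divisor of 10 in both exponents.
    """
    if z < -11.0:
        return 10.0 ** ((z + 3.8) / 10.0)
    return 1.0 - 10.0 ** (-0.3 * (z + 14.0) / 10.0)


# Values on either side of the breakpoint at Z = -11.
just_below = growth(-11.0 - 1e-9)
at_break = growth(-11.0)
```
.fi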
.PP
.sp 1
LOI is proportional to
\s16\(is\s0
\fIB\fR\(fm\d\fIf\fR\u\|\fIP\fR\|(\fIZ\fR\d\fIf\fR\u)
d\fIf\fR, but in practice the integral is replaced by a summation over
a number of bands (normally 14), within each of which \fIZ\fR\d\fIf\fR\u and
\fIB\fR\(fm\d\fIf\fR\u are reasonably constant, just as in the loudness
evaluation. The formula actually used is:
\v'6p'
.RT
.sp 1P
.ce 1000
LOI = \fIAD\fR @ pile { sum above \fIi\fR } @ \fIB\fR\(fm\d\fIi\fR\u\|\fIP\fR\|(\fIZ\fR\d\fIi\fR\u)
.ce 0
.sp 1P
.LP
.sp 1
.LP
where
.LP
\fIB\fR\(fm\d\fIi\fR\u is the frequency weighting for the
\fIi\fRth\ band (shown diagrammatically in Figure\ 2\(hy1),
.LP
\fIZ\fR\d\fIi\fR\u is the mean \fIZ\fR in the \fIi\fRth band,
.LP
\fIP\fR is the appropriate growth function (illustrated in
Figure\ 2\(hy2),
.LP
\fIA\fR is a multiplier depending on the received speech level,
with the value\ 1 for a small range of levels around the optimum
but decreasing rapidly outside this range (see Figure\ 2\(hy3 where the zero
abscissa now corresponds to OLR = 8\ dB (Recommendation\ P.XXE\ [20]) instead
of 4\ dB as previously),
.LP
\fID\fR is a multiplier depending on the received noise level
(ICN\(hyRLR) with a value decreasing slowly from\ 1 at negligible
noise levels towards\ 0 at very high levels (see Figure\ 2\(hy4).
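.PP
The summation can be sketched as below. The weights and the multipliers
\fIA\fR and \fID\fR are defined only graphically (Figures 2\(hy1, 2\(hy3 and
2\(hy4), so they are passed in as plain numbers here. The uniform weights in
the usage lines are placeholders chosen so that the weights sum to 1; they are
not the values of Figure\ 2\(hy1.
.LP
.nf
```python
def listening_opinion_index(weights, growth_values, a=1.0, d=1.0):
    """LOI = A * D * sum_i B'_i * P(Z_i).

    weights        -- frequency weightings B'_i, one per band
    growth_values  -- P(Z_i) already evaluated for each band
    a, d           -- level and noise multipliers (each between 0 and 1)
    """
    if len(weights) != len(growth_values):
        raise ValueError("one weight is needed per band")
    return a * d * sum(b * p for b, p in zip(weights, growth_values))


# Placeholder data: 14 equally weighted bands, each with P(Z) = 1.
uniform = [1.0 / 14.0] * 14
ideal = listening_opinion_index(uniform, [1.0] * 14)
degraded = listening_opinion_index(uniform, [1.0] * 14, a=0.8, d=0.9)
```
.fi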
.LP
.rs
.sp 22P
.ad r
\fBFigure\ 2\(hy1\fR
.sp 1P
.RT
.ad b
.RT
.LP
.bp
.LP
.rs
.sp 24P
.ad r
\fBFigure\ 2\(hy2\fR
.sp 1P
.RT
.ad b
.RT
.LP
.rs
.sp 24P
.ad r
\fBFigure\ 2\(hy3\fR
.sp 1P
.RT
.ad b
.RT
.LP
.bp
.LP
.rs
.sp 19P
.ad r
\fBFigure\ 2\(hy4\fR
.sp 1P
.RT
.ad b
.RT
.PP
Thus it is only for wide\(hyband, noise\(hyfree, distortion\(hyfree speech
at optimum listening level that LOI attains its maximum value of
unity.
.PP
The Listening Opinion Index is related to \fIY\fR\d\fILE\fR\u
in a manner
which depends on the standard of transmission to which listeners have been
accustomed in their recent experience. It is found that the subjects' standard
of judgement is influenced mostly by the best circuit condition experienced
in the current experiment, or, in real calls, by the quality of the best
connections normally experienced. For example, a circuit condition which
earns a score of almost 4 in an experiment where it is the best condition,
would earn a score of perhaps only 3 if a practically perfect condition
were included in the same experiment, and about 3.5 if the best condition
in the same experiment were equivalent in performance to the best connection
that can normally occur in the British Telecom system. A parameter\ LOI\d\s6LIM\s0\u,
introduced to cater for this effect, specifies the value of LOI that
corresponds to maximum \fIY\fR\d\fILE\fR\u; it is generally set
equal to 0.885
when connections are being judged against a background of experience with
the British Telecom network. The relationship in general terms is
\v'6p'
.RT
.sp 1P
.ce 1000
ln @ left ( { \fIY~\dLE~\u\fR } over { 4~\(em~\fIY~\dLE~\u\fR } right ) @ = 1.465 @ left [ ln left ( { LOI } over { LOI~\dLIM~\u~\(em~LOI } right ) \(em~0.75~ right ] @
.ce 0
.sp 1P
.LP
.sp 1
as shown in Figure\ 2\(hy5. This brings us to the point where \fIY\fR\d\fILE\fR\u
has been evaluated for each participant as a function of listening level
\(em in
particular, at the listening level established for each participant when the
other speaks at Reference Vocal Level (RVL), defined in\ [27].
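.PP
Solving the relationship for \fIY\fR\d\fILE\fR\u gives a logistic function of
the bracketed term, scaled to the range 0 to 4. The sketch below assumes that
reading. The function name is hypothetical, and the clamping at the two ends
is an assumption representing the limits of the formula, not a stated rule.
.LP
.nf
```python
import math


def listening_effort_from_loi(loi, loi_lim=0.885):
    """Y_LE on the 0-4 scale from the Listening Opinion Index.

    Solves ln(Y/(4-Y)) = 1.465 * (ln(LOI/(LOI_LIM-LOI)) - 0.75) for Y.
    """
    if loi <= 0.0:
        return 0.0          # assumed limit as LOI tends to 0
    if loi >= loi_lim:
        return 4.0          # LOI_LIM corresponds to maximum Y_LE
    x = 1.465 * (math.log(loi / (loi_lim - loi)) - 0.75)
    return 4.0 / (1.0 + math.exp(-x))


# LOI at which the bracketed term vanishes, so that Y_LE = 2.
loi_mid = 0.885 * math.exp(0.75) / (1.0 + math.exp(0.75))
y_mid = listening_effort_from_loi(loi_mid)
```
.fi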
.sp 1P
.LP
2.7
\fICalculation of\fR
\fIConversation Opinion Score\fR
.sp 9p
.RT
.PP
In order to convert a value of \fIY\fR\d\fILE\fR\u at the appropriate
listening level to the corresponding value of Conversation Opinion Score
(\fIY\fR\d\fIC\fR\u), it is necessary to take account of deviations of
mean vocal
level from RVL.
.PP
The symbol \fIV\fR\d\fIL\fR\u is used to denote the electrical speech
level in dBV at the output of a sending end when the acoustic level at
the input (mouth reference point) is RVL. During conversation, a different
level (\fIV\fR\d\fIC\fR\u) will generally prevail at the same point, because
participants tend to raise
their voices if incoming speech is faint or poor in quality and to lower
them if incoming speech is loud. In other words, \fIV\fR\d\fIC\fR\u at end\ A
depends on
\fIY\fR\d\fILE\fR\u at end\ A, which depends on \fIV\fR\d\fIC\fR\u at
end\ B, which depends on
\fIY\fR\d\fILE\fR\u at end\ B, which depends in turn on \fIV\fR\d\fIC\fR\u at
end\ A. Thus there is a circular dependence or feedback effect.
.bp
.RT
.LP
.rs
.sp 21P
.ad r
\fBFigure\ 2\(hy5\fR
.ad b
.RT
.PP
The sidetone paths introduce complications when STMR\ <\ 13\ dB
(besides contributing noise from the environment to the receiving channel as
already explained). Other things being equal, each talker's vocal level goes
down by almost 1\ dB for every 3\ dB decrease in STMR below 13\ dB, and this of
course further modifies the opinion scores and speech levels at both ends by
virtue of the feedback effect.
.PP
In addition to this, very high sidetone levels are experienced as
unpleasant \fIper\ se\fR , particularly when the connection is poor for other
reasons.
.PP
This complex interrelationship is found to be reasonably well
represented by the following equations.
.RT
.LP
\fIY\fR\(fm\d\fIC\fR\u is an intermediate quantity explained
below.
\v'6p'
.ce 1000
ln @ left ( { \fIY\fR\(fm\fI~\dC\u\fR } over { 4~\(em~\fIY\fR\(fm\fI~\dC\u\fR } right ) @ = 0.7 @ left [ ln left ( { \fIY~\dLE~\u\fR } over { 4~\(em~\fIY~\dLE~\u\fR } right ) ~+~0.5~\(em~ { (13~\(em~STMR) } over { 10 } left ( { 4~\(em~\fIY~\dLE~\u\fR } over { \fIY~\dLE~\u\fR } right ) sup 2 right ] @
.ce 0
.ad r
(2\(hy1)
\v'7p'
.ad b
.RT
.ce 1000
.sp 1
\fIV\fR\d\fIC\fR\u \(em \fIV\fR\d\fIL\fR\u = 4.0 \(em 2.1 \fIY\fR\(fm\d\fIC\fR\u \(em 0.3 K (13 \(em STMR)
.ce 0
.ad r
(2\(hy2)
.ad b
.RT
.LP
.sp 1
.ce 1000
ln @ left ( { \fIY~\dC\u\fR } over { 4~\(em~\fIY~\dC\u\fR } right ) @ = 0.8451 ln @ left ( { \fIY\fR\(fm\fI~\dC\u\fR } over { 4~\(em~\fIY\fR\(fm\fI~\dC\u\fR } right ) @ \(em 0.2727
.ce 0
.ad r
(2\(hy3)
\v'7p'
.ad b
.RT
.LP
.sp 1
where
.LP
K\ =\ 1 if STMR <\ 13,
.LP
K\ =\ 0 otherwise.
.PP
By substituting in equation (2\(hy1) the value of \fIY\fR\d\fILE\fR\u
already found for end\ A \(em which would apply for \fIV\fR\d\fIC\fR\u\ =\ \fIV\fR\d\fIL\fR\u at end\ B \(em
one obtains a first approximation to \fIY\fR\(fm\d\fIC\fR\u, then from
equation\ (2\(hy2) an approximation to \fIV\fR\d\fIC\fR\u at end\ A. The
earlier calculations are repeated with this speech level to find a new
value of \fIY\fR\d\fILE\fR\u at
end\ B, hence an approximation to \fIY\fR\(fm\d\fIC\fR\u and \fIV\fR\d\fIC\fR\u at end\ B. This
process is repeated cyclically until each \fIY\fR\(fm\d\fIC\fR\u converges to a
settled value, at which point equations (2\(hy1) and (2\(hy2) are simultaneously
satisfied.
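.PP
The cyclic procedure can be sketched as a fixed\(hypoint iteration. Everything
about the interface below is hypothetical: the two callables stand in for the
earlier stages of the model and return \fIY\fR\d\fILE\fR\u at one end for a
given departure (in dB) of the far\(hyend speech level from
\fIV\fR\d\fIL\fR\u. The divisor of 10 in the sidetone term of equation
(2\(hy1) is likewise an assumption of this sketch and is exposed as a
parameter.
.LP
.nf
```python
import math


def logit4(y):
    """ln(y / (4 - y)) for a score on the 0-4 scale, 0 < y < 4."""
    return math.log(y / (4.0 - y))


def inv_logit4(x):
    """Inverse of logit4."""
    return 4.0 / (1.0 + math.exp(-x))


def intermediate_score(y_le, stmr, divisor=10.0):
    """Equation (2-1): Y'C from Y_LE and STMR (divisor is assumed)."""
    sidetone = (13.0 - stmr) / divisor * ((4.0 - y_le) / y_le) ** 2
    return inv_logit4(0.7 * (logit4(y_le) + 0.5 - sidetone))


def level_shift(y_c_prime, stmr):
    """Equation (2-2): V_C - V_L in dB, with K = 1 only if STMR < 13."""
    k = 1.0 if stmr < 13.0 else 0.0
    return 4.0 - 2.1 * y_c_prime - 0.3 * k * (13.0 - stmr)


def conversation_score(y_c_prime):
    """Equation (2-3): Y_C from the intermediate score Y'C."""
    return inv_logit4(0.8451 * logit4(y_c_prime) - 0.2727)


def settle(y_le_at_a, y_le_at_b, stmr_a, stmr_b, tol=1e-9, max_iter=100):
    """Alternate between the two ends until both Y'C values settle."""
    shift_b = 0.0            # first pass: end B speaks at V_L
    previous = None
    for _ in range(max_iter):
        yc_a = intermediate_score(y_le_at_a(shift_b), stmr_a)
        shift_a = level_shift(yc_a, stmr_a)
        yc_b = intermediate_score(y_le_at_b(shift_a), stmr_b)
        shift_b = level_shift(yc_b, stmr_b)
        if (previous is not None
                and abs(yc_a - previous[0]) < tol
                and abs(yc_b - previous[1]) < tol):
            return {
                "A": {"y_c_prime": yc_a, "v_shift": shift_a,
                      "y_c": conversation_score(yc_a)},
                "B": {"y_c_prime": yc_b, "v_shift": shift_b,
                      "y_c": conversation_score(yc_b)},
            }
        previous = (yc_a, yc_b)
    raise RuntimeError("speech levels did not settle")
```
.fi
.PP
A caller would supply its own functions for \fIY\fR\d\fILE\fR\u; any function
returning a score strictly between 0 and 4 will do for experimentation, and
convergence is not guaranteed for arbitrary inputs.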
.PP
Figure 2\(hy6 shows the form of the resultant relationship between
\fIY\fR\d\fILE\fR\u and \fIY\fR\(fm\d\fIC\fR\u, for two different values of STMR,
with \fIV\fR\d\fIC\fR\u at its proper value. The transformation [equation
(2\(hy3)],
illustrated in Figure\ 2\(hy7, is then applied to the intermediate score
\fIY\fR\(fm\d\fIC\fR\u, to give the estimated Conversation Opinion Score
\fIY\fR\d\fIC\fR\u, which is shown as a function of \fIY\fR\d\fILE\fR\u in
Figure\ 2\(hy8.
.bp
.RT
.LP
.rs
.sp 24P
.ad r
\fBFigure 2\(hy6\fR
.sp 1P
.RT
.ad b
.RT
.LP
.rs
.sp 24P
.ad r
\fBFigures 2\(hy7 and 2\(hy8\fR
.sp 1P
.RT
.ad b
.RT
.LP
.bp
.sp 1P
.LP
2.8
\fIEvaluation of other subjective measures of performance\fR
.sp 9p
.RT
.PP
Relationships have been developed for various dichotomies of the
opinion scale \(em such as proportion of votes greater than\ 2 (i.e.\ votes
\*QExcellent\*U or \*QGood\*U) \(em and for the percentage of positive
replies to the
\*QDifficulty\*U question (Supplement\ No.\ 2).
.PP
For example, percentage \*QDifficulty\*U is represented by the
equation
\v'6p'
.RT
.sp 1P
.ce 1000
ln @ left ( { \fID\fR } over { 1~\(em~\fID\fR } right ) @ = \(em2.3 ln @ left ( { \fIY~\dC\u\fR } over { 4~\(em~\fIY~\dC\u\fR } right ) @
.ce 0
.sp 1P
.LP
.sp 1
where
.LP
\fID\fR \ \(mu\ 100\ =\ %\fID\fR .
.PP
However, these relationships are satisfactory only for certain
kinds of degradation and are still under review.
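.PP
As a worked illustration of the \*QDifficulty\*U relationship, the sketch
below solves it for %\fID\fR given \fIY\fR\d\fIC\fR\u. The function name is
hypothetical, and the handling of the two end points is an assumption taken
from the limits of the formula.
.LP
.nf
```python
import math


def percent_difficulty(y_c):
    """%D from the Conversation Opinion Score Y_C (0-4 scale).

    Solves ln(D/(1-D)) = -2.3 * ln(Y_C/(4-Y_C)) for D, then scales by 100.
    """
    if y_c <= 0.0:
        return 100.0        # assumed limit for the worst connections
    if y_c >= 4.0:
        return 0.0          # assumed limit for the best connections
    x = -2.3 * math.log(y_c / (4.0 - y_c))
    return 100.0 / (1.0 + math.exp(-x))


# A score of 2 (FAIR) lies at the midpoint of the relationship.
fair = percent_difficulty(2.0)
```
.fi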
.sp 1P
.LP
2.9
\fICorrespondence between calculated and observed values\fR
.sp 9p
.RT
.PP
For symmetrical connections, provided very high sidetone levels and very
high room noise levels are excluded, the model reproduces fairly well
the results of laboratory conversation tests carried out in the U.K. In the
most recent laboratory tests there is a tendency for speech levels and hence
opinion scores to be somewhat lower than those observed earlier, but the
relativities between circuit conditions are not much disturbed by this.
It is believed, but not yet fully established, that approximately the same
relativities hold good for other populations of subjects \(em in particular,
for the population of ordinary telephone users accustomed to the British
Telecom
system \(em even though different absolute values of scores may be obtained
from other populations of subjects or by using different experimental
procedures.
.PP
Comparatively few results are available from experiments on
asymmetrical connections, but such evidence as there is indicates that the
model predicts too much divergence between the two ends of the connection \(em
especially in respect of \fIV\fR\d\fIC\fR\u, less so in respect of \fIY\fR\d\fIC\fR\u.
It is proposed to introduce a feedback feature to reduce the divergence
between
the two \fIV\fR\d\fIC\fR\u values, but care will be needed not to reduce
the \fIY\fR\d\fIC\fR\u
divergence too far as a result of this. HRC\ 4 in Annex\ A gives an example of
CATNAP calculations for a set of connections with asymmetrical losses:
compare these predictions with Reference\ [30] there quoted.
.PP
Predictions of \fIY\fR\d\fIC\fR\u and \fIV\fR\d\fIC\fR\u from both
CATNAP and CATNAP83 have been
compared with the results of a number of conversation experiments conducted
in the U.K. since\ 1976. The degree of agreement is summed up in
Table\ 2\(hy1.
.RT
.ce
TABLE\ 2\(hy1
.ce
\fBComparison of observed (O) and predicted (P) results for two
.ce
models\fR
.ps 9
.vs 11
.nr VS 11
.nr PS 9
.TS
center box;
cw(36p) | cw(60p) | cw(36p) | cw(24p) sw(24p) sw(24p) sw(24p) , ^ | ^ | ^ | c s | c
^ | ^ | ^ | c | c | c | c.
Program Types of connection No. of conversations Deviations (O\ \(em\ P)